71 research outputs found

    PBCS : Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning

    The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research require only local exploration. This has led to the development of algorithms that have basic exploration capabilities and perform poorly on benchmarks that require more versatile exploration. For instance, as demonstrated in our empirical study, state-of-the-art RL algorithms such as DDPG and TD3 are unable to steer a point mass in even small 2D mazes. In this paper, we propose a new algorithm called "Plan, Backplay, Chain Skills" (PBCS) that combines motion planning and reinforcement learning to solve hard exploration environments. In the first phase, a motion planning algorithm is used to find a single good trajectory. In the second phase, an RL algorithm is trained using a curriculum derived from that trajectory, by combining a variant of the Backplay algorithm with skill chaining. We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes, and is able to improve on the trajectory obtained by the motion planning phase.

    The problem with DDPG: understanding failures in deterministic environments with sparse rewards

    In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet can also fail in environments that seem trivial, and the reason for such failures is still poorly understood. In this paper, we contribute a formal explanation of these failures in the particular case of sparse-reward, deterministic environments. First, using a very elementary control problem, we illustrate that the learning process can get stuck in a fixed point corresponding to a poor solution. Then, generalizing from the studied example, we provide a detailed analysis of the underlying mechanisms, which results in a new understanding of one of the convergence regimes of these algorithms. The resulting perspective casts new light on existing solutions to the issues we have highlighted, and suggests other potential approaches. Comment: 19 pages, submitted to ICLR 202

    Une estimation de la cible implicite d’inflation dans la zone euro

    Estimation of the Implicit Inflation Target in the Euro Area. The inflation rate in the euro area countries as a whole declined markedly over the 1980s. Over this period, the unemployment rate increased significantly and economic activity slowed noticeably. Changes in the central banks' implicit inflation target, viewed as low-frequency movements of inflation, are a potential explanation for these developments. To shed light on this issue, the present study estimates the dynamics of the implicit inflation target in the euro area over the period 1970(1)-2004(4) using a small macroeconometric model. The implicit target, which the econometrician does not observe, is identified through a minimal set of theoretical restrictions: (i) the inflation target follows a non-stationary process, (ii) inflation is an exclusively monetary phenomenon in the long run, and (iii) changes in the implicit target have no long-run effects on real variables. The model is estimated so as to match output growth, changes in inflation, and the ex post real interest rate. Our main results are: (i) inflation target shocks account for the bulk of fluctuations in nominal variables, even in the short run; (ii) owing to monetary policy inertia and nominal stickiness, changes in the target generate large swings in the real interest rate, which translate into short-run effects on real variables; (iii) in spite of this, inflation target shocks have only a moderate impact on output dynamics.
    Sahuc Jean-Guillaume, Matheron Julien, Fève Patrick. Une estimation de la cible implicite d’inflation dans la zone euro. In: Revue française d'économie, volume 24, n°2, 2009. pp. 39-56

    A Pitfall with DSGE-Based, Estimated, Government Spending Multipliers

    This paper examines issues related to the estimation of the government spending multiplier (GSM) in a Dynamic Stochastic General Equilibrium context. We stress a potential source of bias in the GSM arising from the combination of Edgeworth complementarity/substitutability between private consumption and government expenditures and endogenous government expenditures. Due to cross-equation restrictions, omitting the endogenous component of government policy at the estimation stage would lead an econometrician to underestimate the degree of Edgeworth complementarity and, consequently, the long-run GSM. An estimated version of our model with US postwar data shows that this bias matters quantitatively. The results prove to be robust to a number of perturbations.